Add managed-memory advise, prefetch, and discard-prefetch free functions #1775
rparolin wants to merge 19 commits into NVIDIA:main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

/ok to test
question: Does making these member functions of the `Buffer` class make sense?
I'm moving this back into draft. We discussed it in our team meeting because I was already hesitant: Buffer is becoming a 'God object' with the functionality it is gaining. We were going to explore alternatives. Free functions sound like a good alternative to explore.
…ns in the cuda.core.managed_memory namespace
…ups, fix docs

- Remove duplicate long-form "cu_mem_advise_*" string aliases from _MANAGED_ADVICE_ALIASES; users pass short strings or the enum directly
- Replace 4 boolean allow_* params in _normalize_managed_location with a single allowed_loctypes frozenset driven by _MANAGED_ADVICE_ALLOWED_LOCTYPES
- Cache immutable runtime checks: CU_DEVICE_CPU, v2 bindings flag, discard_prefetch support, and advice enum-to-alias reverse map
- Collapse hasattr+getattr to single getattr in _managed_location_enum
- Move _require_managed_discard_prefetch_support to top of discard_prefetch for fail-fast behavior
- Fix docs build: reset Sphinx module scope after managed_memory section in api.rst so subsequent sections resolve under cuda.core
- Add discard_prefetch pool-allocation test and comment on _get_mem_range_attr

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e legacy path The _V2_BINDINGS cache in _buffer.pyx persists across tests, so monkeypatching get_binding_version alone is insufficient when earlier tests have already populated the cache with the v2 value. Promote _V2_BINDINGS from cdef int to a Python-level variable so tests can monkeypatch it directly via monkeypatch.setattr, and reset it to -1 in both legacy-signature tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t real hardware These three tests call cuMemAdvise on real CUDA devices and verify memory range attributes. On devices without concurrent_managed_access (e.g. Windows/WDDM), set_read_mostly silently no-ops and set_preferred_location fails with CUDA_ERROR_INVALID_DEVICE. Use the stricter _skip_if_managed_location_ops_unsupported guard, matching the pattern already used by test_managed_memory_functions_accept_raw_pointer_ranges. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s support Reorder checks in discard_prefetch so _normalize_managed_target_range runs before _require_managed_discard_prefetch_support. This ensures non-managed buffers raise ValueError before the RuntimeError for missing cuMemDiscardAndPrefetchBatchAsync support. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
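The error-ordering fix can be illustrated with a small sketch: range validation (which raises `ValueError` for non-managed memory) runs before the driver-support probe (which raises `RuntimeError`). Helper names are simplified stand-ins for the real Cython helpers, and the driver interaction is stubbed out.

```python
# Sketch of the check ordering: ValueError for a non-managed target wins
# over RuntimeError for missing discard-prefetch support.

def _normalize_managed_target_range(is_managed, ptr, size):
    if not is_managed:
        raise ValueError("target is not managed memory")
    return ptr, size


def _require_discard_prefetch_support(supported):
    if not supported:
        raise RuntimeError("cuMemDiscardAndPrefetchBatchAsync not available")


def discard_prefetch(is_managed, ptr, size, supported):
    # Range validation runs first, so a non-managed buffer raises
    # ValueError even when the driver lacks discard-prefetch support.
    ptr, size = _normalize_managed_target_range(is_managed, ptr, size)
    _require_discard_prefetch_support(supported)
    return ptr, size
```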
…ps module Move advise, prefetch, and discard_prefetch functions and their helpers out of _buffer.pyx into a new _managed_memory_ops Cython module to improve separation of concerns. Expose _init_mem_attrs and _query_memory_attrs as non-inline cdef functions in _buffer.pxd so the new module can reuse them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
I'm not against merging this change, but I think we need to revisit the Buffer design and seriously consider alternatives.
The main concern I have is that Buffer being flat leads us to create non-Pythonic free functions, such as the following:
```python
managed_memory.advise(buffer, "set_preferred_location", 3, location_type="host_numa")
```
A more Pythonic interface would put managed-memory-specific features into a subclass; something like the following:
```python
class ManagedBuffer(Buffer):
    preferred_location: Device | Host | NumaNode | None
    read_mostly: bool
    accessed_by: set[Device | Host]

    def prefetch(self, location, *, stream): ...
    def discard_prefetch(self, location, *, stream): ...
```
Usage would be something like:
```python
buf.preferred_location = Device(0)        # set_preferred_location to device
buf.preferred_location = Host()           # set_preferred_location to host
buf.preferred_location = Host(numa_id=3)  # set_preferred_location to NUMA node
buf.preferred_location = None             # unset_preferred_location

buf.read_mostly = True                    # set_read_mostly
buf.read_mostly = False                   # unset_read_mostly

buf.accessed_by.add(Device(0))            # set_accessed_by
buf.accessed_by.discard(Device(0))        # unset_accessed_by

buf.prefetch(Device(0), stream=s)         # prefetch to device
buf.prefetch(Host(), stream=s)            # prefetch to host
```
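For illustration, a property such as `preferred_location` could delegate to the proposed free functions roughly as follows. The `advise` function here is a recording stub standing in for `cuda.core.managed_memory.advise`, and every name is hypothetical; this just shows that the property-style surface can be a thin layer over the free functions.

```python
# Sketch: a property setter dispatching to the managed-memory free
# functions. `advise` is a stub that records calls instead of touching
# the driver; all names are hypothetical.

calls = []


def advise(buffer, advice, location=None, **kwargs):
    calls.append((advice, location))


class ManagedBuffer:
    def __init__(self):
        self._preferred = None

    @property
    def preferred_location(self):
        return self._preferred

    @preferred_location.setter
    def preferred_location(self, location):
        if location is None:
            advise(self, "unset_preferred_location")
        else:
            advise(self, "set_preferred_location", location)
        self._preferred = location
```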
```cython
cdef struct _MemAttrs:
    int device_id
    bint is_device_accessible
    bint is_host_accessible
    bint is_managed
```

```cython
cdef class Buffer:
    cdef:
        DevicePtrHandle _h_ptr
        size_t _size
        MemoryResource _memory_resource
        object _ipc_data
        object _owner
        _MemAttrs _mem_attrs
        bint _mem_attrs_inited
        object __weakref__
```
Not directly related to your change, but I think Buffer is too complicated. We should revisit the design.
```cython
Device()
ret = cydriver.cuPointerGetAttributes(3, attrs, <void**>vals, ptr)
HANDLE_RETURN(ret)
```
Also unrelated: I don't think we auto-init CUDA anywhere else and I don't think the code should be this defensive.
nit: we should refactor memory tests to a subdirectory: tests/memory/test_managed_ops.py and siblings.
@leofang Can you be a tie breaker here? Do you feel that these APIs should have an object-oriented style and live on a `Buffer`? cc: @Andy-Jost
Summary
Add managed-memory `advise()`, `prefetch()`, and `discard_prefetch()` as free functions under the new `cuda.core.managed_memory` namespace, wrapping the CUDA driver APIs `cuMemAdvise`, `cuMemPrefetchAsync`, and `cuMemDiscardAndPrefetchBatchAsync`.

Closes #1332
Details
New public API — `cuda.core.managed_memory` module with three functions:

- `advise(target, advice, location, *, size, location_type)` — apply managed-memory advice to a range
- `prefetch(target, location, *, stream, size, location_type)` — prefetch a range to a target location
- `discard_prefetch(target, location, *, stream, size, location_type)` — discard and prefetch a range

Each function accepts either a
`Buffer` (size inferred) or a raw pointer (requires `size=`). Location can be specified as a `Device`, int ordinal, `-1` for host, or with an explicit `location_type` (`"device"`, `"host"`, `"host_numa"`, `"host_numa_current"`). Advice can be a `CUmem_advise` enum value or a string alias like `"set_read_mostly"`. The `stream` parameter on `prefetch` and `discard_prefetch` also accepts a `GraphBuilder`.

Location validation matches the CUDA driver spec:

- `set_read_mostly`, `unset_read_mostly`, `unset_preferred_location` — location is optional; allowed types are `device`, `host`, `host_numa`
- `set_preferred_location` — all four location types valid
- `set_accessed_by`, `unset_accessed_by` — only `device` and `host` (rejects `host_numa` and `host_numa_current`)

Backward compatibility — when
`cuda.bindings < 13.0`, the functions fall back to the legacy `cuMemAdvise(ptr, size, advice, device_int)` / `cuMemPrefetchAsync(ptr, size, device_int, stream)` signatures. Enum lookups for the legacy path are cached to avoid repeated `hasattr`/`getattr` calls.

Implementation notes:

- New `_managed_memory_ops.pyx` module under `cuda.core._memory`
- `_buffer.pxd` exposes `_init_mem_attrs`, `_query_memory_attrs`, and the `_MemAttrs` struct (with a new `is_managed` field) for use by the ops module
- `_normalize_managed_location` handles all location inference and constraint checking; each branch returns directly with no dead fallthrough code
- Pointer attributes are queried via `cuPointerGetAttributes` (the existing `_MemAttrs` infrastructure)
- The `cuda.core.managed_memory` module re-exports the three functions from the Cython implementation
- Also available as `cuda.core.experimental.managed_memory`

Tests
Adds coverage for:

- `advise`/`prefetch`/`discard_prefetch` on managed-memory pool buffers and externally wrapped managed allocations
- `advise` with `CUmem_advise` enum values (not just string aliases)
- location-type validation: all four types for `set_preferred_location`; `host_numa`/`host_numa_current` rejection for `set_accessed_by`
- legacy integer locations (`-1` → host, `0` → device)
- `prefetch` with `location=None` raises `ValueError`
- `size=` rejection when target is a `Buffer` (`TypeError`)
- the legacy-signature fallback path (monkeypatched `get_binding_version`)
- raw pointer ranges with explicit `size=`
- range-attribute verification via `cuMemRangeGetAttribute`
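The legacy-signature fallback exercised by these tests can be illustrated with a small dispatch sketch. The driver calls are stubbed and only the version gate is shown; the function bodies and the way the location collapses to a device ordinal are assumptions for illustration, not the real internals.

```python
# Sketch of the version-gated dispatch: cuda.bindings >= 13.0 takes the
# new location-based signature, older bindings take the legacy
# (ptr, size, advice, device_int) form. Driver calls are stubbed.

def _legacy_advise(ptr, size, advice, device_int):
    return ("legacy", ptr, size, advice, device_int)


def _v2_advise(ptr, size, advice, location):
    return ("v2", ptr, size, advice, location)


def advise(ptr, size, advice, location, binding_version):
    if binding_version >= (13, 0):
        return _v2_advise(ptr, size, advice, location)
    # Legacy path: collapse the location to a plain device ordinal.
    device_int = location if isinstance(location, int) else -1
    return _legacy_advise(ptr, size, advice, device_int)
```

A test can then pin `binding_version` below `(13, 0)` to force the legacy branch, which is essentially what the monkeypatched `get_binding_version` tests do.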